Resource Provisioning Framework for MapReduce Jobs with Performance Goals
Authors
Abstract
Many companies increasingly use MapReduce for efficient large-scale data processing such as personalized advertising, spam detection, and various data mining tasks. Cloud computing offers an attractive option for businesses to rent a suitably sized Hadoop cluster, consume resources as a service, and pay only for the resources that were used. One of the open questions in such environments is how many resources a user should lease from the service provider. Often, a user targets specific performance goals, and the application needs to complete its data processing by a certain deadline. Currently, however, the task of estimating the resources required to meet application performance goals rests solely with the user. In this work, we introduce a novel framework and technique to address this problem and to offer a new resource sizing and provisioning service in MapReduce environments. For a MapReduce job that needs to complete within a certain time, a job profile is built from the job's past executions, or by executing the application on a smaller data set using an automated profiling tool. Then, by applying scaling rules combined with a fast and efficient capacity planning model, we generate a set of resource provisioning options. Moreover, we design a model for estimating the impact of node failures on job completion time, to evaluate worst-case scenarios. We validate the accuracy of our models using a set of realistic applications. The predicted completion times of the generated resource provisioning options are within 10% of the measured times on our 66-node Hadoop cluster.
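The capacity planning step the abstract describes can be pictured with a short sketch. The Python below is not the authors' code: the profile fields, slot limits, and example numbers are illustrative assumptions. It applies the classic greedy-assignment makespan bound used in the authors' related performance-modeling work: a phase of n tasks with average duration avg and maximum duration max, run on k slots, completes within (n-1)*avg/k + max. Enumerating slot allocations whose bound meets the deadline yields a set of provisioning options, and re-running the bound with fewer slots gives a rough stand-in for the paper's more detailed node-failure analysis.

```python
# A minimal sketch (not the authors' implementation) of deadline-driven
# capacity planning for a MapReduce job, based on makespan upper bounds.
from dataclasses import dataclass

@dataclass
class JobProfile:
    n_map: int      # number of map tasks (scales with input size)
    m_avg: float    # average map task duration, seconds
    m_max: float    # maximum map task duration, seconds
    n_red: int      # number of reduce tasks
    r_avg: float    # average reduce task duration (shuffle included), seconds
    r_max: float    # maximum reduce task duration, seconds

def t_upper(p: JobProfile, s_map: int, s_red: int) -> float:
    """Worst-case completion time with s_map map slots and s_red reduce
    slots, from the greedy-assignment bound (n-1)*avg/slots + max."""
    map_phase = (p.n_map - 1) * p.m_avg / s_map + p.m_max
    red_phase = (p.n_red - 1) * p.r_avg / s_red + p.r_max
    return map_phase + red_phase

def provisioning_options(p: JobProfile, deadline: float, max_slots: int = 256):
    """Minimal (map slots, reduce slots) pairs whose worst-case completion
    time meets the deadline; each pair is one provisioning option."""
    options = []
    for s_map in range(1, max_slots + 1):
        for s_red in range(1, max_slots + 1):
            if t_upper(p, s_map, s_red) <= deadline:
                options.append((s_map, s_red))
                break  # smallest s_red for this s_map
    return options

def worst_case_with_failures(p: JobProfile, s_map: int, s_red: int,
                             slots_per_node: int, failed_nodes: int) -> float:
    """Crude failure approximation: a failed node removes its slots for
    the remainder of the run; the paper's failure model is more detailed."""
    lost = failed_nodes * slots_per_node
    return t_upper(p, max(1, s_map - lost), max(1, s_red - lost))

if __name__ == "__main__":
    prof = JobProfile(n_map=400, m_avg=45, m_max=70,
                      n_red=40, r_avg=120, r_max=180)
    for s_map, s_red in provisioning_options(prof, deadline=1800)[:3]:
        print(s_map, s_red, round(t_upper(prof, s_map, s_red)))
```

Because the bound is a closed-form expression, scanning all slot allocations is cheap, which is what makes this style of capacity planning "fast and efficient" enough to offer as an interactive sizing service.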
Similar resources
SLO-Driven Right-Sizing and Resource Provisioning of MapReduce Jobs
Workshop on Large Scale Distributed Systems and Middleware (LADIS'2011), held in conjunction with VLDB'2011, Seattle, Washington, Sept. 2-3, 2011. SLO-Driven Right-Sizing and Resource Provisioning of MapReduce Jobs. Abhishek Verma, Ludmila Cherkasova, Roy H. Campbell. HP Laboratories, HPL-2011-126. Keywords: MapReduce; Hadoop; performance models; completion time prediction; resource allocation. There is an increasing number of MapReduce applications, e.g., persona...
Dynamically Scheduling a Component-Based Framework in Clusters
In many clusters and datacenters, application frameworks that offer programming models such as Dryad and MapReduce are deployed, and jobs submitted to these clusters or datacenters may be targeted at specific framework instances, for example because of the presence of certain data. An important question that then arises is how to allocate resources to framework instances that may have highl...
Big Data Using Hadoop
Hadoop Performance Modeling for Job Estimation and Resource Provisioning. MapReduce has become a major computing model for data-intensive applications. Hadoop, an open source implementation of MapReduce, has been adopted by an increasingly large user community. Cloud computing service providers such as Amazon EC2 Cloud offer the opportunities for Hadoop users to lease a certain amou...
Bringing Elastic MapReduce to Scientific Clouds
The MapReduce programming model, proposed by Google, offers a simple and efficient way to perform distributed computation over large data sets. The Apache Hadoop framework is a free and open-source implementation of MapReduce. To simplify the usage of Hadoop, Amazon Web Services provides Elastic MapReduce, a web service that enables users to submit MapReduce jobs. Elastic MapReduce takes care o...
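For readers unfamiliar with the service this snippet describes, a minimal boto3 sketch of submitting a Hadoop streaming job through Elastic MapReduce might look like the following. The bucket paths, script names, instance types and counts, and release label are placeholders, and the account is assumed to already have the default EMR_DefaultRole and EMR_EC2_DefaultRole IAM roles.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Launch a transient Hadoop cluster and run one streaming step.
# All S3 paths and sizes below are placeholders.
response = emr.run_job_flow(
    Name="wordcount-example",
    ReleaseLabel="emr-6.15.0",
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 4,
        "KeepJobFlowAliveWhenNoSteps": False,  # tear down when the step ends
    },
    Steps=[{
        "Name": "wordcount",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", "s3://my-bucket/scripts/mapper.py,"
                          "s3://my-bucket/scripts/reducer.py",
                "-mapper", "mapper.py",
                "-reducer", "reducer.py",
                "-input", "s3://my-bucket/input/",
                "-output", "s3://my-bucket/output/",
            ],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])  # cluster id for polling job status
```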
QoS-Based Pricing and Scheduling of Batch Jobs in OpenStack Clouds
The current Cloud infrastructure services (IaaS) market employs a resource-based selling model: customers rent nodes from the provider and pay per-node per-unit-time. This selling model places the burden upon customers to predict their job resource requirements and durations. Inaccurate prediction by customers can result in over-provisioning of resources, or under-provisioning and poor job perf...